Create perplexity-lens.md #32
Open
Description
Brief description of your contribution
Type of Contribution
Checklist
Project Details
What problem does this solve?
Many developers struggle to extract meaningful, structured data from long-form PDFs, especially ones with inconsistent formatting, multi-column layouts, or embedded tables. This project addresses that with an offline, lightweight PDF parsing toolkit that extracts structured text, headings (H1-H3), and tables into a clean JSON format.
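To make the target output concrete, here is a minimal sketch of what the JSON could look like. The class names (`Heading`, `Table`, `ParsedDocument`), the field names, and the sample file name `report.pdf` are all hypothetical and are not taken from the project; the sketch only illustrates text, H1-H3 headings, and tables serialized together.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Heading:
    level: int  # 1, 2, or 3, corresponding to H1-H3
    text: str
    page: int


@dataclass
class Table:
    page: int
    rows: list[list[str]]  # assumption: the first row holds the column headers


@dataclass
class ParsedDocument:
    source: str
    headings: list[Heading] = field(default_factory=list)
    tables: list[Table] = field(default_factory=list)
    text: str = ""

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Hypothetical sample data, written by hand rather than parsed from a real PDF
doc = ParsedDocument(
    source="report.pdf",
    headings=[Heading(level=1, text="Introduction", page=1)],
    tables=[Table(page=2, rows=[["Metric", "Value"], ["Pages", "12"]])],
    text="Introduction\n...",
)
print(doc.to_json())
```

Keeping headings and tables as separate lists, each carrying a page number, is one possible design; it lets a consumer rebuild an outline or pull out tables without re-reading the full text.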
What makes this contribution valuable to other developers?
This toolkit combines multiple parsing libraries to improve accuracy and coverage, so developers can skip the tedious work of manually extracting content from PDFs. It is especially useful for researchers, technical writers, and anyone building search or summarization features over document data. Being offline-first makes it suitable for sensitive or restricted environments.
GitHub Repository
Live Demo (View Shared Graph)
External Links (if applicable):
Testing
Smart Text Selection: Selected various text samples on different web pages to check that AI-generated explanations were accurate and returned quickly.
Webpage Summarization: Used the “Summarize” feature on diverse sites (articles, blogs, documentation) to verify concise and relevant summaries.
Retrieval-Augmented Insights (RAG): Hovered and clicked on words/phrases for context retrieval and ensured that RAG-based results were accurate and contextually relevant.
Knowledge Graph Visualization: Added multiple concepts, then navigated, zoomed, and dragged nodes to confirm that the D3.js graph responded smoothly and displayed correct connections.
Public Sharing: Generated and accessed shared graph URLs to validate that public sharing worked as intended, without exposing private data.
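The public-sharing check above could be automated along the following lines. This is only a sketch under stated assumptions: the function `to_public_graph`, the allow-lists, and the private field names (`notes`, `source_url`, `owner_email`) are hypothetical and do not come from the project. The `nodes`/`links` shape with `source` and `target` keys follows the layout that D3 force-directed graphs commonly consume.

```python
# Hypothetical allow-lists: only these keys survive in a shared graph
PUBLIC_NODE_FIELDS = {"id", "label"}
PUBLIC_LINK_FIELDS = {"source", "target"}


def to_public_graph(graph: dict) -> dict:
    """Return a copy of the graph containing only allow-listed fields."""
    return {
        "nodes": [
            {k: v for k, v in node.items() if k in PUBLIC_NODE_FIELDS}
            for node in graph.get("nodes", [])
        ],
        "links": [
            {k: v for k, v in link.items() if k in PUBLIC_LINK_FIELDS}
            for link in graph.get("links", [])
        ],
    }


# Hypothetical private graph with fields that must not be shared
private_graph = {
    "owner_email": "user@example.com",
    "nodes": [
        {"id": "n1", "label": "Perplexity", "notes": "personal note"},
        {"id": "n2", "label": "RAG", "source_url": "https://example.com/private"},
    ],
    "links": [{"source": "n1", "target": "n2", "notes": "why I linked these"}],
}

public_graph = to_public_graph(private_graph)

# The shared payload keeps structure but drops everything not allow-listed
assert "owner_email" not in public_graph
assert all(set(node) <= PUBLIC_NODE_FIELDS for node in public_graph["nodes"])
assert all(set(link) <= PUBLIC_LINK_FIELDS for link in public_graph["links"])
```

An allow-list is used here instead of a deny-list so that any field added later stays private unless it is explicitly marked as shareable.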
Screenshots (if applicable)
Additional Notes